### Replication Material for: 
### Journal: Research & Politics
### Article: The Manifesto Corpus: a New Resource for Research on Political Parties and Quantitative Text Analysis
### Authors: Nicolas Merz, Sven Regel, Jirka Lewandowski


This file describes how all tables and figures used in the article can be reproduced. 

Replication requires the statistical software R and the companion R package "manifestoR".
To install the package from CRAN, execute the following line:
   
   install.packages("manifestoR")

To connect to the server and access the Manifesto Corpus, an API key is necessary.
You need to register on the Manifesto Project's website to generate your individual API key.

Proceed as follows:
- go to https://manifesto-project.wzb.eu,
- register,
- log in,
- go to your profile page,
- click "generate api key".

Download the key file and place it in the same directory as the script files. 

You can set the API key with either of the following commands:

mp_setapikey(key = "<key>")
or
mp_setapikey(key.file = "manifesto_apikey.txt")

This command is also included at the beginning of every R script file in this replication material.
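
If a script stops at the key step, it can help to check the key file before calling mp_setapikey(). The following is a minimal sketch, not part of manifestoR: read_api_key() is a hypothetical helper, and it assumes the downloaded key file holds the key on its first line.

```r
# Hypothetical helper (not part of manifestoR): read the key file defensively.
# Assumption: the key is stored on the first line of the file.
read_api_key <- function(path = "manifesto_apikey.txt") {
  if (!file.exists(path)) {
    stop("API key file not found: ", normalizePath(path, mustWork = FALSE))
  }
  key <- trimws(readLines(path, n = 1, warn = FALSE))
  if (length(key) == 0 || !nzchar(key)) {
    stop("API key file is empty: ", path)
  }
  key
}

# Usage (requires manifestoR and a real key file in the working directory):
# library(manifestoR)
# mp_setapikey(key = read_api_key())
```

The helper only reports a missing or empty file early; the scripts themselves keep using mp_setapikey(key.file = ...) as described above.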

Additionally, you might have to install further packages, as some of the scripts depend on one or more of the following: dplyr, ggplot2, magrittr, NLP, openNLP, openNLPmodels.de, purrr, RTextTools, SnowballC, stringi, stringr, tables, wordcloud. Most of these packages can be installed directly from CRAN, but openNLPmodels.de might have to be installed from "http://datacube.wu.ac.at/".

install.packages(c("dplyr", "ggplot2", "magrittr", "NLP", "openNLP", "purrr", "RTextTools", "SnowballC", "stringi", "stringr", "tables", "wordcloud"))
install.packages("openNLPmodels.de", repos = "http://datacube.wu.ac.at/", type = "source")
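
To avoid reinstalling packages that are already present, one option is to list only the missing ones first. The sketch below is an illustration under that assumption; missing_packages() is a hypothetical helper, and the install calls are left commented out so that nothing is downloaded unintentionally.

```r
# Packages named in this README, split into CRAN and non-CRAN as described above.
cran_pkgs <- c("manifestoR", "dplyr", "ggplot2", "magrittr", "NLP", "openNLP",
               "purrr", "RTextTools", "SnowballC", "stringi", "stringr",
               "tables", "wordcloud")
other_pkgs <- "openNLPmodels.de"

# Hypothetical helper: which of `pkgs` are absent from the installed library?
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

cran_missing  <- missing_packages(cran_pkgs)
other_missing <- missing_packages(other_pkgs)
message("Missing CRAN packages: ", paste(cran_missing, collapse = ", "))
message("Missing other packages: ", paste(other_missing, collapse = ", "))

# Uncomment to install only what is missing:
# if (length(cran_missing) > 0) install.packages(cran_missing)
# if (length(other_missing) > 0) {
#   install.packages("openNLPmodels.de",
#                    repos = "http://datacube.wu.ac.at/", type = "source")
# }
```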

Use setwd() to set the R working directory to the directory that contains the script files; otherwise the R scripts will not work properly. The scripts draft_compare.R and classification.R are computationally intensive and can take several hours to run.
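
A quick way to confirm that the working directory is set correctly is to check that the script files are visible from it. This is a minimal sketch: missing_files() is a hypothetical helper, the script names are taken from the file list in this README, and the path in the commented setwd() call is a placeholder.

```r
# Script names as given in the list of files in this README.
scripts <- c("clean.R", "coverage.R", "wordcloud.R", "dictionary.R",
             "draft_compare.R", "classification.R")

# Hypothetical helper: return the entries of `files` not found in `dir`.
missing_files <- function(files, dir = getwd()) {
  files[!file.exists(file.path(dir, files))]
}

# setwd("path/to/replication-material")  # placeholder path, adjust as needed

absent <- missing_files(scripts)
if (length(absent) > 0) {
  message("Not found in ", getwd(), ": ", paste(absent, collapse = ", "))
} else {
  message("All scripts found in ", getwd())
}
```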


List of files and their purpose:

* clean.R removes all files that are generated by the scripts listed below
* coverage.R produces the coverage table (Table 1)
* wordcloud.R generates the comparison wordcloud (Figure 1)
* dictionary.R produces the table with the most unique word stems in many different languages (Table 2)
* draft_compare.R produces Figure 2 and Figure 3 - it requires the draft versions from the input folder
* classification.R does the semi-automatic classification task and produces Figure 4
* greens.txt (draft manifesto of the Green party for the 2013 elections - necessary for the text reuse application)
* spd_entwurf.txt (draft manifesto of the SPD for the 2013 elections - necessary for the text reuse application)

The following files are created by the scripts:
* greens.csv - processed data produced by draft_compare.R
* spd.csv - processed data produced by draft_compare.R
* auto-scatter.pdf (Figure 4, generated by classification.R)
* cloud.pdf (Figure 1, generated by wordcloud.R)
* gruene_comparison.pdf (one part of Figure 3, generated by draft_compare.R)
* gruene_density.pdf (one part of Figure 2, generated by draft_compare.R)
* spd_comparison.pdf (one part of Figure 3, generated by draft_compare.R)
* spd_density.pdf (one part of Figure 2, generated by draft_compare.R)
* dictionary.tex (tex file of Table 2, generated by dictionary.R)
* coverage.tex (tex file of Table 1, generated by coverage.R)
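
After running the scripts, the list above can be used to check which outputs exist. The sketch below is illustrative only: output_status() is a hypothetical helper, and it assumes the scripts write their outputs to the working directory, which this README does not state explicitly.

```r
# Output files and the scripts that generate them, as listed in this README.
expected_outputs <- c(
  "greens.csv"            = "draft_compare.R",
  "spd.csv"               = "draft_compare.R",
  "auto-scatter.pdf"      = "classification.R",
  "cloud.pdf"             = "wordcloud.R",
  "gruene_comparison.pdf" = "draft_compare.R",
  "gruene_density.pdf"    = "draft_compare.R",
  "spd_comparison.pdf"    = "draft_compare.R",
  "spd_density.pdf"       = "draft_compare.R",
  "dictionary.tex"        = "dictionary.R",
  "coverage.tex"          = "coverage.R"
)

# Hypothetical helper: one row per expected file, with a flag for its presence.
# Assumption: outputs are written to `dir` (by default the working directory).
output_status <- function(outputs, dir = getwd()) {
  data.frame(
    file   = names(outputs),
    script = unname(outputs),
    exists = file.exists(file.path(dir, names(outputs))),
    stringsAsFactors = FALSE
  )
}

print(output_status(expected_outputs))
```

Rows with exists = FALSE point to the script that still needs to be run, or to an output written somewhere other than the assumed directory.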

